


Institutional Archive of the Naval Postgraduate School 


Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items 


1996-09 


The effect of gender on attrition at the 
Defense Language Institute Foreign Language Center 


Arthur, George T. 


Monterey, California. Naval Postgraduate School 


http://hdl.handle.net/10945/32214


Downloaded from NPS Archive: Calhoun 


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed (and published) scholarly author.

Dudley Knox Library / Naval Postgraduate School


411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 








http://www.nps.edu/library 





NAVAL POSTGRADUATE SCHOOL 
Monterey, California 





THESIS 


THE EFFECT OF GENDER ON ATTRITION AT 
THE DEFENSE LANGUAGE INSTITUTE 
FOREIGN LANGUAGE CENTER 










by 
George T. Arthur 


September, 1996 





Thesis Advisor: Lyn R. Whitaker 
Approved for public release; distribution is unlimited.



















REPORT DOCUMENTATION PAGE
Form Approved OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for
reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and completing and
reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection
of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for
Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the
Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503.


1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE
September 1996

3. REPORT TYPE AND DATES COVERED
Master's Thesis

4. TITLE AND SUBTITLE
The Effect of Gender on Attrition at the Defense Language Institute
Foreign Language Center

5. FUNDING NUMBERS

6. AUTHOR(S)
Arthur, George T.

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Naval Postgraduate School
Monterey CA 93943-5000

8. PERFORMING ORGANIZATION REPORT NUMBER

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

10. SPONSORING/MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES
The views expressed in this thesis are those of the author and do not reflect the official policy or position of
the Department of Defense or the U.S. Government.

12a. DISTRIBUTION/AVAILABILITY STATEMENT
Approved for public release; distribution unlimited

12b. DISTRIBUTION CODE






13. ABSTRACT (maximum 200 words) 
The Defense Language Institute Foreign Language Center (DLIFLC), located at the Presidio of Monterey,
California, provides language training for Department of Defense military and civilian personnel. The Institute
trains approximately 2,500 students annually, of which approximately 26 percent are female. Student attrition is
a costly feature of this training program. Females experience roughly a 7 percent higher rate of attrition than
males at DLIFLC. The Institute is interested in knowing whether this difference indicates a gender bias, or
whether it can be explained by other factors. This study investigates this question. Specifically, data on FY-95
DLIFLC students are examined to determine factors with a significant impact on attrition, with particular
emphasis on gender. Such information is potentially useful to the Institute for internal quality assurance efforts
as well as part of potential cost saving measures.


14. SUBJECT TERMS
Gender, Attrition, Language Training, DLI

15. NUMBER OF PAGES
58

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT
Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE
Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT
Unclassified

20. LIMITATION OF ABSTRACT
UL

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18











Approved for public release; distribution is unlimited. 


THE EFFECT OF GENDER ON ATTRITION 
AT THE DEFENSE LANGUAGE INSTITUTE 
FOREIGN LANGUAGE CENTER 


George T. Arthur 
Lieutenant, United States Navy 
B.S., United States Naval Academy, 1986 


Submitted in partial fulfillment 
of the requirements for the degree of 


MASTER OF SCIENCE IN OPERATIONS RESEARCH 
from the 


NAVAL POSTGRADUATE SCHOOL 
September 1996 





Author: George T. Arthur

Approved by: Lyn R. Whitaker, Thesis Advisor

Stephen M. Payne, Second Reader





Frank C. Petho, Chairman 
Department of Operations Research 











ABSTRACT 





The Defense Language Institute Foreign Language Center (DLIFLC), 
located at the Presidio of Monterey, California, provides language training for 
Department of Defense military and civilian personnel. The Institute trains 
approximately 2,500 students annually, of which approximately 26 percent are 
female. Student attrition is a costly feature of this training program. Females 
experience roughly a 7 percent higher rate of attrition than males at DLIFLC. 
The Institute is interested in knowing whether this difference indicates a gender 
bias, or whether it can be explained by other factors. This study investigates this 
question. Specifically, data on FY-95 DLIFLC students are examined to 
determine factors which have a significant impact on attrition, with particular 
emphasis on gender. Such information is useful to the Institute for internal
quality assurance efforts as well as part of potential cost saving measures.













TABLE OF CONTENTS

I. INTRODUCTION
    A. PROBLEM STATEMENT
    B. LANGUAGE SKILL CHANGE PROJECT
    C. THESIS ORGANIZATION
II. DATA
    A. THE SOURCE
    B. THE DATA
    C. VARIABLES
        1. Demographic Variables
        2. Language Related Variables
        3. Test Score Variables
III. ANALYSIS
    A. THE MODEL
    B. ANALYSIS
        1. Model Reduction
        2. All Data
        3. Without USMC Data
        4. Army Data Only
        5. Air Force Data Only
        6. Gender As Response Variable
IV. RESULTS/CONCLUSIONS
    A. RESULTS
    B. CONCLUSIONS
    C. RECOMMENDATIONS FOR FURTHER STUDY
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST











EXECUTIVE SUMMARY 


The Defense Language Institute Foreign Language Center (DLIFLC), 
located at the Presidio of Monterey, California, provides language training for 
Department of Defense military and civilian personnel. The Institute trains 
approximately 2,500 students annually, of which approximately 26 percent are 
female. Student attrition is a costly feature of this training program. Females 
experience roughly a 7 percent higher rate of attrition than males at DLIFLC. 
The Institute has asked whether this difference is an indication of potential
gender bias or a function of other characteristics. This study investigates
this question.

The methodology used for this study involves fitting a logistic regression 
model with graduation/attrition as the response, and a variety of demographic, 
language specific, and test score variables as predictors. By analyzing variables 
with a significant effect on the model, it is possible to identify factors which 
contribute to student attrition, with particular emphasis on gender. 
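
For reference, a logistic regression model of the kind described above is conventionally written as shown below. The notation is generic and assumed here rather than taken from this thesis: $p_i$ denotes the probability that student $i$ fails to graduate (which outcome is coded as the event is an assumption), $x_{ij}$ is the value of predictor $j$ for student $i$, and $\beta_j$ are the coefficients to be estimated.

```latex
\log\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
p_i = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\right)}
```

Under this form, a predictor such as gender is judged significant when its coefficient differs from zero by more than sampling variation would explain, after the other predictors are accounted for.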

Data are obtained from the combined Defense Language Institute Foreign 
Language Center - Defense Manpower Data Center data base, and include 
students scheduled to graduate in FY-95. There are 1,985 students in the data 
used for this study. 

Separate models are run on aggregate data and on individual service 
groups. For the aggregate data, the interaction between gender and service 
branch is a significant predictor of attrition. This is because, for Air Force 
students, gender itself is a significant predictor of attrition. Other attributes are 
different for Air Force students as well. The proportion of females for Air Force 
students is higher than for the other services. Also, a higher percentage of Air 
Force females are in the more difficult (Category IV) languages at DLIFLC. 


Finally, Air Force females are mostly in paygrades E-3 and below; students in
these paygrades tend to be at a higher risk for attrition. Preliminary results show
that the higher attrition statistics for females are not likely due to their gender; 
rather, females are over-represented in certain ‘high risk’ groups. 

In general, for all students, language difficulty category and prior
language experience tend to have the most impact on attrition, followed by 
certain demographic variables and test scores. Further study is suggested on 
the issues concerning Air Force students, and on the specific reasons why 
students fail to graduate (i.e., academic, administrative, etc.). 

The information gained from this study should assist the Institute with
internal quality assurance measures, and provide it with a better understanding
of the relationship between gender and attrition at DLIFLC.











I. INTRODUCTION


The Defense Language Institute Foreign Language Center (DLIFLC) is 
located at the United States Army Presidio of Monterey, California. The Institute 
is responsible for training military members from all four service branches, as 
well as civilian Federal employees, in a variety of missions requiring knowledge 
of a foreign language. The Institute produces approximately 2,500 graduates
annually. (Directorate for Academic Administration, 1995)

A. PROBLEM STATEMENT

At the DLIFLC, approximately 26 percent of the student population is
female. DLIFLC FY-95 data indicate that the attrition rate among females is
approximately 34 percent, while that of males is approximately 27 percent
(Figure 1). By comparison, FY-95 Army-wide attrition for Initial Entry Trainees¹
(IETs) is approximately 16 percent among females and 10 percent among males
(Dove, 1996)². Does the 7 percentage point difference in overall attrition for DLIFLC
students indicate the existence of gender bias, or is the difference a manifestation of other
factors (e.g., a higher percentage of female students in more difficult curricula, or
general differences in attrition among IETs)? Interest in
gender-related attrition at DLIFLC goes back at least two decades; a 1975 point
paper entitled Army Linguist Personnel Study (ALPS) cited attrition statistics
which were remarkably similar to contemporary numbers, with overall female
attrition of 34.6%, and overall male attrition of 27% (Rice, 1975). The Institute is
interested in further exploration of these issues, and this study does so. The
information provided by this study will assist the Institute with internal quality
assurance efforts, as well as provide potentially useful information to the chain
of command.

¹ Initial Entry Trainees are those soldiers who have not yet completed their Basic and
Advanced Individual Training.
² This study does not address the difference between DLIFLC attrition statistics and those
of IETs in other training programs. Its focus is on attrition within DLIFLC.

While there is little background literature addressing the unique 
environment of military language training, the effect of gender on first language 
development is relatively well-documented. In general, females learn to talk and 
use sentences earlier than males, and are shown to use a greater variety of 
words (O'Mara, 1994). Furthermore, from about the sixth grade through college, 
females consistently outscore males on a variety of measures of verbal skills
(O'Mara, 1994). The exact reason for these differences is unknown. 
Neurological studies have shown, however, that there are physiological 
differences between the brains of males and females. These differences include 
the presence of more neurons and increased size in areas of the brain 
associated with language function. These physiological differences as well as 
the effects of differing cultural expectations are thought to be significant. 
(Begley, 1995) 


[Bar chart: percent attrition for males and females in FY 93, FY 94, FY 95, and in aggregate.]
Figure 1. Percentage of male/female students who failed to graduate with their class.








It is reasonable to assume that this advantage in aptitude among females
would manifest itself in second language learning as well. This appears to
contradict the attrition statistics shown in Figure 1. It is interesting to note
that although overall attrition among females is higher than for males, academic
attrition among females is approximately 15% lower than for males³ (Figure 2).
The 1975 ALPS study found a 9% lower academic attrition rate for females. This
comparison suggests a possible explanation for the contradiction; i.e., it is
possible that factors unrelated to academic performance
may account for the higher overall attrition rate among females. This issue will
be explored further in this study.

[Bar chart: percent attrition for males and females, total attrition vs. academic attrition.]
Figure 2. Students scheduled to graduate in FY 95. Comparison of total attrition vs.
percentage of non-graduates who attrited for academic reasons.

³ Overall attrition refers to students who fail to graduate for any reason. Academic attrition
refers to students who fail to graduate specifically due to academic performance.

B. LANGUAGE SKILL CHANGE PROJECT

The results of a study similar to this thesis were released in August of
1994. The study, entitled Language Skill Change Project (LSCP), was
conducted by the DLIFLC Research and Analysis Division with the support of
PRC, Inc., a civilian contractor. The LSCP reported no specific conclusions 
about the effect of gender on attrition, although gender was a sub-factor in a 
predictor block including various demographic variables. The predictor block 
including sex, level of education, and age was found to be collectively 
significant. (O'Mara, 1994) 

There are several key areas in which this study differs from the LSCP 
study. The first is scope. The main focus of the LSCP was to track changes in 
language proficiency (listening, reading, and speaking) over time. While 
language training attrition was addressed in the LSCP, it was not the primary 
emphasis, and was restricted to academic attrition (O'Mara, 1994). This study 
addresses language training attrition of all types, and language proficiency is not
addressed. 

The second area in which the two studies differ is in the subject 
population. The LSCP included only U.S. Army personnel who had, or were 
preparing for, military intelligence linguist occupational specialty codes, who 
were enrolled in either Spanish, German, Russian or Korean (one language in 
each of the four language difficulty categories). This study includes students in 
all branches of the military, and spans all applicable languages and language 
difficulty categories. While the LSCP was a longitudinal study, tracking students’ 
progress over a 3 to 4 year period, this study is a cross-sectional study, 
including those students who were scheduled to graduate during FY-95, and 
includes 1,985 subjects.⁴

The third major area in which the studies differ is in the data. Data used
in the LSCP included information available in the subjects' records, as well as a
series of special instruments used in assessing a variety of aptitudes, attitudes,
motivational factors and personality-related characteristics (O'Mara, 1994). Data
used in this study include information available from current records, and do
not incorporate any special testing instruments or surveys not normally
administered to the language trainee population as a whole.

⁴ At the request of the Institute, the focus is on recent trends. FY 95 enrollees are chosen
as this is the latest year for which complete data are available. Students who were 're-cycled'
from prior classes in the same language or who were transferred from other languages are
excluded. Re-cycling is the process of removing a student from his/her current class, and
starting them over in a later class in the same language. This can occur for many reasons, such
as poor academic performance, medical problems, etc.


C. THESIS ORGANIZATION 

Chapter II gives an overview of the data used to conduct this study. It
contains an explanation of the data source, and the methods used to identify
relevant variables. Variables selected for use in modeling are explained in
detail. Chapter III contains the bulk of the analysis. Preliminary data exploration
is conducted on the variables selected in Chapter II. An explanation of the
logistic regression model used in this study and its results are provided in
Chapter III. Chapter IV summarizes final results, and provides conclusions and
recommendations for further study.








II. DATA


The data gathered for this study are used in two stages. First, preliminary 
analysis is performed on each variable to determine which variables are suitable 
for inclusion as potential predictors of attrition. Second, variables identified in 
the first stage for inclusion are used to construct a regression model of attrition. 
Of particular interest is whether gender is a significant predictor of attrition. The
preliminary analysis and selection process are discussed in this chapter, and
further analysis stemming from the regression model is found in Chapter III.

A. THE SOURCE

This study is being conducted with the cooperation of the DLIFLC
Research and Analysis Division and the Command Historian. Data are drawn
from the combined DLIFLC - Defense Manpower Data Center (DMDC) Student 
Database (S3D). S3D represents a comprehensive aggregation of data 
elements extracted from DLIFLC's Student Data Base and DMDC's Active, Loss, 
Reserve and Civilian files. These files are large, containing thousands of
records (one per individual) with over 350 data fields per record, concatenated
by the students' social security number and updated quarterly. (Shaw, et al,
1994)


B. THE DATA 


At the request of the Institute, emphasis is placed on recent trends. This 
is done to capture the effects of contemporary policies at the Institute, without 
consideration of changing effects over time. Therefore, this study concentrates 
on students who were scheduled to graduate during FY 95 because this is the 
most recent full year for which data are available. Students eligible for
consideration are those considered as new inputs. This criterion eliminates
students who were in intermediate or advanced classes, as well as those who
were transferred from other languages or re-cycled from earlier classes in the





same language. The rationale for this criterion is two-fold: 1) the excluded 
subjects are not considered typical of the student population at large, and 
therefore could introduce confounding effects in the analysis, and 2) the 
excluded subjects represent less than 10 percent of the target population and 
therefore do not constitute a significant portion of the population. All students
who meet the above criteria are included in the data, resulting in 1,985
observations. The data include students from each of the four language
difficulty categories, and span all four branches of the service.


C. VARIABLES 

Each record in the database has 352 variables. Through in-depth 
consultation with subject matter experts at the Institute, 43 of these variables are 
identified as potential candidates for inclusion, and are defined in Table 1. 
Redundant variables are excluded, as well as those which clearly have no 
relevance to the question of attrition. 

To simplify the modeling effort, it is necessary to further refine the set of 
candidate predictor variables. For each variable, the decision is to either 
exclude it, use it in its current form, or use it as a basis for some new 
transformed variable. 

The binary response variable indicating graduation or attrition 
(GRAD/ATTR) is constructed from the variables output status (OUT) and reason 
for output (REASON). This is done by evaluating the output status and reason 
codes and determining whether a particular student successfully completed 
his/her curriculum on time. If so, they are labeled a graduate, otherwise they are 
placed into the attrition category. 
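
The construction of the response variable can be sketched as follows. This is a minimal illustration only: the text does not list the actual OUT and REASON code values, so the codes "GRAD" and "COMPLETED" and the function name `grad_attr` are hypothetical placeholders.

```python
# Hypothetical (OUT, REASON) pairs that denote on-time completion.
# The real code values in the database are not given in the text.
ON_TIME_COMPLETION = {("GRAD", "COMPLETED")}


def grad_attr(out_status: str, reason: str) -> str:
    """Return 'GRAD' for on-time completion, otherwise 'ATTR'."""
    if (out_status, reason) in ON_TIME_COMPLETION:
        return "GRAD"
    # Any other combination (academic, administrative, medical, ...)
    # falls into the attrition category.
    return "ATTR"
```

Treating every non-completion pair as attrition mirrors the rule stated above that a student who does not complete the curriculum on time is placed in the attrition category.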

The explanatory variables fall loosely into three categories: 1)
demographic variables, 2) variables associated with the language studied at
DLIFLC or prior language experience, and 3) variables associated with test
results measuring learning aptitude or demonstrated ability.











VARIABLE | DESCRIPTION | TYPE | LEVELS
OUT | student output category | |
REASON | reason for in or out of class | |
SEX | gender | |
PAYGRD | paygrade | |
EDUYR | years of education | continuous | N/A
MARRY | marital status | |
DOB | date of birth | nominal | N/A
SERV | service | |
ETHNIC | race/ethnic | nominal |
LENGTH | length of course (weeks) | nominal | N/A
PRILANG | prior language code | nominal |
THER | native of other language | nominal |
RPROF | proficiency of prior language | ordinal |
(illegible) | experience of prior language | |
LANCAT | language category | | 4
GPA | grade point average (DLIFLC) | continuous | N/A
DLPTL | Defense Language Proficiency Test score (listening) | | N/A
DLPTR | Defense Language Proficiency Test score (reading) | | N/A
(illegible) | Defense Language Proficiency Test score (speaking) | | N/A
(illegible) | Defense Language Aptitude Battery Test score | continuous | N/A
AFQT | Armed Forces Qualification Test score | |
TESTV | Armed Forces Qualification Test form version | nominal | N/A
ASVFM | Armed Services Vocational Aptitude Battery test form version | nominal | N/A
GS | Armed Services Vocational Aptitude Battery test - general science | continuous | N/A
AR | Armed Services Vocational Aptitude Battery test - arithmetic reasoning | continuous | N/A
(illegible) | Armed Services Vocational Aptitude Battery test - word knowledge | continuous | N/A
(illegible) | Armed Services Vocational Aptitude Battery test - paragraph comprehension | continuous |
(illegible) | Armed Services Vocational Aptitude Battery test - numeric (remainder illegible) | continuous |

Table 1. Variables downloaded from data base.


1. Demographic Variables

The following variables are related to demographics: gender, social 
security number, paygrade, years of service, years of prior education, marital 
status, motivation, age, branch of service, and ethnic background. The binary 
predictor variable describing gender (SEX) is included because this is the
primary predictor of interest. As shown in Chapter I, Figure 1, there appears to
be increased attrition among female students. The nominal variable listing a
student's social security number (SSN) is excluded as this information is used 
for data management and has no impact on attrition. 

The categorical variable indicating an observation's military paygrade 
(PAYGRD) contains 20 levels. Some of these levels have very few 
observations. For example, W-5 has only one observation. PAYGRD is
therefore transformed into a continuous variable (PAYGRD2) in the following
manner: each level of PAYGRD (E-1 through O-6) is arranged in increasing
order, then is coded numerically. E-1 is assigned as '1', E-2 as '2' and so forth,
ending with O-6 assigned as '20'. PAYGRD is used as the basis for another
categorical variable, indicating whether an observation is an officer or is enlisted
(OFF/ENL). This variable contains two levels and is formed by assigning all
observations with paygrade E-9 and below to the enlisted category and all
others to the officer category. This variable is designed to detect any possible
differences between officers and enlisted students with respect to attrition.
PAYGRD2 and OFF/ENL are included in the data set. There appears to be a
decreasing and then increasing rate of attrition among enlisted students as they
become more senior in paygrade. A similar relationship exists among
commissioned officers. There is no clear trend among warrant officers. The
relationship between paygrade and attrition is depicted in Figure 3. It is
interesting to note that a vast majority of students come from lower (E-3 and
below) paygrades (Figure 4).
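
The two paygrade transformations described above can be sketched as follows. The ordering E-1 to E-9, W-1 to W-5, O-1 to O-6 is an assumption that is consistent with the stated 20 levels and with O-6 being coded as 20; the string format of the grades and the names `PAYGRADES`, `PAYGRD2`, and `off_enl` are illustrative only.

```python
# Assumed ordering of the 20 paygrade levels: enlisted, warrant, commissioned.
PAYGRADES = (
    [f"E-{i}" for i in range(1, 10)]    # E-1 .. E-9
    + [f"W-{i}" for i in range(1, 6)]   # W-1 .. W-5
    + [f"O-{i}" for i in range(1, 7)]   # O-1 .. O-6
)

# PAYGRD2: numeric code, E-1 -> 1, E-2 -> 2, ..., O-6 -> 20.
PAYGRD2 = {grade: code for code, grade in enumerate(PAYGRADES, start=1)}


def off_enl(paygrade: str) -> str:
    """OFF/ENL: E-9 and below are enlisted, all others are officers."""
    return "ENL" if PAYGRD2[paygrade] <= PAYGRD2["E-9"] else "OFF"
```

Note that under the rule as stated, warrant officers fall into the officer category because they rank above E-9 in this ordering.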


[Bar chart: percent attrition by paygrade (E1 through O6) for males, females, and overall. Female/E8, W1, W3, W5, O5, and O6 had no observations.]
Figure 3. Percentage of attrition vs. paygrade.


Several of the predictor variables provide age-type information. One of
them is the continuous variable indicating years of military service (YRSRV).
Although YRSRV may be redundant with PAYGRD or other such variables,
such variables are included in the study. In the case of YRSRV, the majority of
observations (67%) have less than two years of service. For graphical purposes
the observations are separated into those with fewer than two years of service
and those with two or more years of service. There appears to be a higher rate
of attrition among observations in the former group, as depicted in Figure 5.

[Bar chart: percentage of students and cumulative percentage by paygrade (E1 through O6).]
Figure 4. Paygrade distribution of subject data.

[Bar chart: percent attrition for males, females, and overall; less than two vs. two or more years of service.]
Figure 5. Percentage of attrition vs. years of military service.





In the data set used for this study, there are some occurrences of missing 
values. In the case of the continuous variable indicating years of education 
(EDUYR), approximately 20 percent of the observations have missing values. A 
common attribute of the missing data for this variable is that they are all attrites. 
There is no clear reason for this; it would be useful for future research purposes 
to determine the cause of this situation, and correct the data collection 
procedures, if necessary. Care needs to be exercised in the handling of missing 
values. If an observation has a missing value for any of its variables, that 
observation is usually excluded from analysis. To prevent the complete 
exclusion of observations with missing values for EDUYR, this variable is 
transformed from continuous to nominal. A new variable, EDUYRgroup, is 
formed by including all observations with missing values in one level (N/A), all 
observations with no more than a high school education in another level (HS), 
and all observations with some college in a third level (HS+). Thus, 
EDUYRgroup is included in the data set to detect possible effects of quantity of
prior education on attrition. From Figure 6, students with some college have a
lower percentage of attrition.
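
The recoding of EDUYR into EDUYRgroup can be sketched as follows. The cutoff of 12 years for a high school education is an assumption, since the text does not state the threshold used; the function name `eduyr_group` is illustrative.

```python
import math
from typing import Optional

# Assumption: 12 years of schooling corresponds to a high school education.
HIGH_SCHOOL_YEARS = 12


def eduyr_group(eduyr: Optional[float]) -> str:
    """Map years of education to the nominal levels N/A, HS, or HS+."""
    # Missing values get their own level so the observation is retained.
    if eduyr is None or math.isnan(eduyr):
        return "N/A"
    return "HS" if eduyr <= HIGH_SCHOOL_YEARS else "HS+"
```

Giving missing values a level of their own keeps those observations in the model instead of dropping them, which matters here because the missing EDUYR values all belong to students who attrited.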


[Bar chart: percent attrition by prior education group (N/A, HS, HS+) for males, females, and overall.]
Figure 6. Percentage of attrition vs. prior education.





The binary variable indicating marital status (MARRY) is included to 
explore the possible effects of marital status on attrition. Overall, married 
students seem to have a lower percentage of attrition than single students. 
However, married females appear to experience a higher percentage of attrition 
than single females. Figure 7 shows the relationship between marital status and
attrition.

[Bar chart: percent attrition by marital status (married, single) for males, females, and overall.]
Figure 7. Percentage of attrition vs. marital status.


The ordinal variable describing a student's motivation to study the
assigned language (MOTIV) contains 5 levels. The levels are self-evaluated by the
student, and range from 1 (least motivated) to 5 (most motivated). This variable
is included to examine the effects of motivation on attrition. From Figure 8, after
level 2, there is a steady decline in percentage of attrition as motivation
increases.

The variable indicating date of birth (DOB) is transformed into the
variable AGE by computing a subject's age as of 01JAN95. AGE is included in
the predictor set. For graphical purposes, AGE is broken into four age groups.
From Figure 9 it appears that the percentage of attrition generally decreases
with age. Thus AGE is treated as a continuous variable rather than as a
categorical variable.
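
The derivation of AGE from DOB can be sketched as follows. Counting age in completed years is an assumption; the text says only that age is computed as of 01JAN95, and a fractional-year definition would serve equally well for a continuous predictor. The names `REFERENCE_DATE` and `age_on` are illustrative.

```python
from datetime import date

# Reference date stated in the text: 1 January 1995.
REFERENCE_DATE = date(1995, 1, 1)


def age_on(dob: date, ref: date = REFERENCE_DATE) -> int:
    """Completed years of age on the reference date."""
    # Subtract one year if the birthday has not yet occurred in the
    # reference year.
    had_birthday = (ref.month, ref.day) >= (dob.month, dob.day)
    return ref.year - dob.year - (0 if had_birthday else 1)
```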


[Bar chart: percent attrition by motivation level (1 = least motivated, 5 = most motivated) for males, females, and overall. Female/MOTIV 1 had no observations.]
Figure 8. Percentage of attrition vs. motivation.

[Bar chart: percent attrition by age group for males, females, and overall.]
Figure 9. Percentage of attrition vs. age.


The categorical variable indicating which branch of service a student was
in (SERV) is included to pick up any relationship between service component
and attrition. From Figure 10, Army students had the highest overall attrition
(36%) while Navy students had the lowest overall attrition (23%). The fact that
female Marines experienced 60% attrition is potentially significant.

[Bar chart: percent attrition by branch of service for males, females, and overall.]
Figure 10. Percentage of attrition vs. branch of service.


The categorical variable describing a student's ethnic group (ETHNIC) is 
included to determine any effects of ethnic background on attrition. From Figure 
11, there is wide variation in attrition percentage across different groups, ranging 
from a high of 57% overall attrition for those observations listed as
'unknown/none', to a low of 17% overall attrition for Hispanics.

2. Language Related Variables 

The following variables are related to a student's language training and 
experience, both prior to and at DLIFLC: language category, language 
identification code, course length, prior language category, prior language 
experience level, prior language source, prior language proficiency, and whether
a student is a native speaker of English or of some other language.

The ordinal categorical variable indicating a student's language category
(LANCAT) has four levels: I, II, III, IV. These levels indicate, in increasing
order, the relative difficulty of a student's particular language curriculum in
accordance with established guidelines at the Institute. This variable is
included to show the effects of language difficulty on attrition. As shown in
Figure 12, the two most difficult levels have a greater percentage of attrition.

[Figure 11 plot: percent attrition (0 to 100) by ethnicity, shown for males,
females, and overall.]

Figure 11. Percentage of attrition vs. ethnic group.

The nominal variable indicating a student's language identification code 
(LID) specifies a unique code for each particular language curriculum. For the 
data in this study, this variable has 22 levels, some with too few observations to 
be useful. For example, Greek has only 3 observations. Since each LID code
falls within exactly one LANCAT level, and LANCAT contains the desired
information (i.e., relative difficulty), LID is excluded in favor of LANCAT.
The advantage of using LANCAT
instead of LID is that it allows for the pooling of LID categories with relatively few 
observations into their respective language categories. The variable indicating a 
student's curriculum length in weeks (LENGTH) is excluded. This is because 
LENGTH varies as a function of language difficulty, and therefore the 
information provided by LENGTH is reflected in LANCAT. 







The nominal variable indicating prior language experience is called prior
language code (PRILANG). This variable is coded the same as LID, and for this
data set has 46 levels. This variable is used as the basis for another variable,
prior language category (PRILANCAT). PRILANCAT is computed in the same
manner as LANCAT, by assigning each observation with prior language
experience to its associated relative difficulty category. The variable
PRILANCAT is included in favor of PRILANG for the same reasons that LANCAT
is preferred over LID. An additional benefit of including PRILANCAT is that it is
directly comparable to LANCAT. From Figure 13, students with no prior
language experience have a higher percentage of attrition, second only to
students with prior experience in category IV languages. Of students with prior
language experience, there is an increased percentage of attrition among
PRILANCAT IV students. Nominal variables indicating prior language
experience level, prior language source, prior language proficiency, and whether
a student is a native speaker of English or of some other language (PREXP,
PRSRC, PRPROF, NATENG, OTHER) are excluded. This is done because the
desired information (i.e., relative difficulty of prior language, if any) is contained
in the variable PRILANCAT.

[Figure 12 plot: percent attrition by language category (I, II, III, IV), shown
for males, females, and overall.]

Figure 12. Percentage of attrition vs. language category.

3. Test Score Variables 

The following variables are related to aptitude or performance measures:
Armed Services Vocational Aptitude Battery, Armed Forces Qualification Test,
Defense Language Aptitude Battery, Defense Language Proficiency Tests, test
form versions, and grade point average. The first three, Armed Services
Vocational Aptitude Battery (ASVAB), Armed Forces Qualification Test (AFQT),
and Defense Language Aptitude Battery (DLAB), are important to this study. The
ASVAB is a battery of 10 tests administered to potential recruits, measuring
such skills as general science, paragraph comprehension, and mathematics
knowledge. A complete listing of these subtests is located in Table 1. The
AFQT is a composite measure formed from the ASVAB subtests. The DLAB test is a
specific measure of language learning aptitude, administered to language
training candidates. The continuous variable DLAB is included to capture the
effects of language learning aptitude on attrition. From Figure 14, there is a
generally decreasing percentage of attrition as DLAB scores increase.

[Figure 13 plot: percent attrition (0 to 100) by prior language category (0, I,
II, III, IV), shown for males, females, and overall.]

Figure 13. Percentage of attrition vs. prior language category.

Many of the ASVAB subtests measure similar types of aptitude. This
redundancy in the tests can result in multicollinearity of the test scores. To
guard against multicollinearity, and to potentially reduce the number of predictor
variables, the method of principal components is used. Principal components is
a technique that results in orthogonal linear combinations of the predictor
variables (or standardized versions of the predictor variables). The first principal
component is the linear combination of the predictor variables, with a coefficient
vector of unit length, that has the greatest variance among all such linear
combinations of the predictor variables. The second principal component is the
linear combination of predictor variables that has the greatest variance among
all those linear combinations that are orthogonal to the first, and so on. The
principal components are derived from an eigenvalue decomposition of the
correlation matrix for the standardized variables, or the covariance matrix for
the original variables. For variables that are measured on dissimilar scales it is
important to perform principal components on standardized variables. Since
ASVAB test scores are standardized, principal components on the original and
standardized variables yield similar results (Hamilton, 1992).

[Figure 14 plot: percent attrition (0 to 100) by DLAB quartile, shown for males,
females, and overall.]

Figure 14. Percentage of attrition vs. DLAB.

Let $x_1, x_2, \ldots, x_k$ represent the $n \times 1$ vectors of scores for each of the $k$ tests,
where $n = 1{,}985$ observations and $k = 10$ tests. The corresponding vectors of
standardized test scores $z_1, z_2, \ldots, z_k$ are defined as:

$$z_j = \frac{1}{s_j}\left(x_j - \bar{x}_j \mathbf{1}\right) \quad \text{for } j = 1, \ldots, k \tag{2.1}$$

where $\bar{x}_j$ is the average over all observations for the $j$th test, $s_j$ is the standard
deviation for the $j$th test, and $\mathbf{1}$ represents the $n \times 1$ vector of ones required to
make the vectors conformable. The first principal component of the correlation
matrix is:

$$\alpha_1 z_1 + \alpha_2 z_2 + \cdots + \alpha_k z_k , \tag{2.2}$$

where $(\alpha_1, \alpha_2, \ldots, \alpha_k)$ is the first eigenvector of the correlation matrix and the $\alpha$'s
are the loadings of each of the vectors of standardized variables. With subtest
abbreviations as subscripts, the values of $\alpha_{GS}, \alpha_{AR}, \alpha_{WK}, \alpha_{PC}, \alpha_{NO}, \alpha_{CS}, \alpha_{AS}, \alpha_{MK}, \alpha_{MC}, \alpha_{EI}$
are, respectively, (.34, .31, .34, .34, .31, .28, .28, .33, .31, .31). As shown in
Figure 15, the first principal component accounts for approximately 68 percent of
the variation in the ASVAB test scores.
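
As a rough illustration of how a first principal component is extracted, the
sketch below runs power iteration on a correlation matrix. The matrix is
hypothetical: ten variables sharing a common pairwise correlation of 0.644, a
value chosen here only so that the first component explains about 68 percent of
the variance. It is not the ASVAB correlation matrix from this study, and the
function name is an assumption made for the example.

```python
import math

def first_principal_component(corr, iters=200):
    """Approximate the leading eigenvector (loadings) and eigenvalue of a
    symmetric correlation matrix by power iteration."""
    k = len(corr)
    # Deliberately non-uniform starting vector, normalized to unit length.
    v = [float(i + 1) for i in range(k)]
    norm = math.sqrt(sum(a * a for a in v))
    v = [a / norm for a in v]
    for _ in range(iters):
        w = [sum(corr[r][c] * v[c] for c in range(k)) for r in range(k)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector v.
    cv = [sum(corr[r][c] * v[c] for c in range(k)) for r in range(k)]
    eigenvalue = sum(v[r] * cv[r] for r in range(k))
    return v, eigenvalue

# Hypothetical equicorrelated matrix (an assumption, not the thesis data).
K, RHO = 10, 0.644
corr = [[1.0 if r == c else RHO for c in range(K)] for r in range(K)]
loadings, eigenvalue = first_principal_component(corr)
share = eigenvalue / K  # fraction of total variance explained
```

With equal correlations every loading equals 1/sqrt(10), about 0.32, which is
close to the nearly equal loadings reported above.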


[Figure 15 plot: cumulative percentage of variation (0 to 100) by principal
component.]

Figure 15. Cumulative percentage of variation in ASVAB test scores attributed to each
principal component.


Equation (2.2) can be translated into the original test scores by replacing the
standardized variables with the original variables, giving:

$$\frac{\alpha_1}{s_1} x_1 + \frac{\alpha_2}{s_2} x_2 + \cdots + \frac{\alpha_k}{s_k} x_k - \left( \sum_{j=1}^{k} \frac{\alpha_j \bar{x}_j}{s_j} \right) \mathbf{1} . \tag{2.3}$$

Thus, the first principal component corresponds, up to a constant shift and
scale, to a weighted average of the original variables, where the weights are the
loadings divided by the standard deviation of that variable. As shown in
Figure 16, the loadings and standard deviations are about the same for each of
the variables.





[Figure 16 plot: loadings (left axis, 0 to 1) and standard deviations (right
axis, 0 to 20) for each ASVAB subtest: GS, AR, WK, PC, NO, CS, AS, MK, MC, EI.]

Figure 16. Loadings and standard deviations for each subtest in the first principal
component of ASVAB scores.

The fact that the first principal component accounts for most of the
variation in the test scores, and that the loadings for each of the factors in that
principal component are about equal, means that an average of the test scores
(weighting each test equally) accounts for the bulk of the variation in the ASVAB
scores. Thus, a new variable, ASVABavg, was computed and is included in
favor of the subtest scores. The net result of the principal component analysis is
the reduction in the dimension of the ASVAB scores from ten to one.

The ASVAB and AFQT scores are other cases where there are a 
significant number of missing values. For these variables, approximately 30 
percent of the observations have missing values. Approximately 40 percent of 
these missing values are attributed to the subjects being officers, because 
officers do not routinely take ASVAB tests. The remainder of the missing values 
for these variables are unexplained, but appear to be equally distributed among 
the other variables and have no other common attributes. As in the case of 
EDUYR, there is a concern over the handling of observations with missing 
values. Left uncorrected, this situation would lead to the exclusion of all officers
and about 21% of enlisted observations. 

To prevent the complete exclusion of observations with missing values for
ASVABavg and AFQTavg, these variables are transformed from continuous to
ordinal variables. Each observation with a recorded score is assigned to its
quartile, producing four categories. Then, the observations with missing values
are placed into a fifth category. Thus, the variables ASVABqtiles and AFQTqtiles
are included in favor of ASVABavg and AFQTavg. In this manner, observations
with missing values for ASVABavg and AFQTavg can be included in the analysis
across the entire range of predictors. As depicted in Figures 17 and 18, there is
a generally decreasing percentage of attrition as test scores increase.
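
The recoding described above can be sketched as follows. This is a minimal
illustration, not the procedure actually used in the study: the function name,
the use of `None` for a missing score, the quartile cut-point rule (Python's
default `statistics.quantiles` method), and the sample scores are all
assumptions. The codes follow Table 2 (0 = missing value, 1 = lower quartile,
up to 4 = upper quartile).

```python
import statistics

def to_quartile_codes(scores):
    """Recode continuous scores as 1..4 by quartile; missing (None) becomes 0."""
    observed = [s for s in scores if s is not None]
    # Three cut points computed from the observed scores only.
    q1, q2, q3 = statistics.quantiles(observed, n=4)
    codes = []
    for s in scores:
        if s is None:
            codes.append(0)      # missing-value category
        elif s <= q1:
            codes.append(1)      # lower quartile
        elif s <= q2:
            codes.append(2)
        elif s <= q3:
            codes.append(3)
        else:
            codes.append(4)      # upper quartile
    return codes

# Hypothetical scores; the two None entries stand for missing test results.
sample = [10, 20, 30, 40, 50, 60, 70, 80, None, None]
codes = to_quartile_codes(sample)
```

Because the missing values form their own level, no observation is dropped when
the recoded variable enters the model.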

Variables indicating ASVAB and AFQT test versions (ASVFM and 
TESTV, respectively) are excluded, since these test scores are standardized 
and are therefore comparable without regard to test version. 

Upon successful completion of study at the Institute, students are
administered the Defense Language Proficiency Tests - Listening, Reading, and
Speaking (DLPTL, DLPTR, DLPTS). Variables listing scores for the DLPTs are
excluded as they are not available for students who attrite. Similarly, the
variable listing a student's grade point average while at the Institute (GPA) is
excluded since it is only recorded upon successful completion of the program.

[Figure 17 plot: percent attrition by ASVAB quartile, shown for males, females,
and overall; N/A denotes missing value.]

Figure 17. Percentage of attrition vs. ASVAB test scores.

[Figure 18 plot: percent attrition by AFQT quartile (N/A, lower, 2, 3, upper),
shown for males, females, and overall; N/A denotes missing value.]

Figure 18. Percentage of attrition vs. AFQT test scores.


After undergoing the preceding preliminary analysis, the data set includes
the binary response variable GRAD/ATTR and the fifteen predictor variables
listed in Table 2. These variables are used in modeling and further analysis of
attrition. Development of a regression model of attrition is found in Chapter III.


VARIABLE     DESCRIPTION                                TYPE         LEVELS

LANCAT       language difficulty category               ordinal      I, II, III, IV

DLAB         defense language aptitude battery test     continuous
             scores

YRSRV        years of military service                  continuous

MARRY        marital status                             nominal      [illegible]

MOTIV        level of motivation, self evaluated by                  1. least motivated
             student                                                 ...
                                                                     5. most motivated

PRILANCAT    prior language category; difficulty                     0, I, II, III, IV
             level of prior language, if any;
             compatible with LANCAT

AGE          age as of 01JAN95                          continuous   N/A

ETHNIC       ethnic category                                         0. unknown/none
                                                                     1. white
                                                                     2. black
                                                                     3. hispanic
                                                                     4. amer. indian/alaskan
                                                                     5. asian/pacific islander
                                                                     6. other

ASVABqtile   armed services vocational aptitude         ordinal      0. missing value
             battery test score quartile                             1. lower quartile
                                                                     2. second quartile
                                                                     3. third quartile
                                                                     4. upper quartile

AFQTqtile    armed forces qualification test            ordinal      0. missing value
             (composite of ASVAB) quartile                           1. lower quartile
                                                                     2. second quartile
                                                                     3. third quartile
                                                                     4. upper quartile

EDUYRgrp     highest year of education completed        nominal      [illegible]

SERVICE      branch of service                          nominal      USA, USAF, USN, USMC

PAYGRADE2    military paygrade                          continuous   E1 = 1, ..., O5 = 20

[two further rows illegible]

Table 2. Variables selected for use in analysis.










III. ANALYSIS


This chapter gives the details of the analysis performed on the data set
and variables developed in Chapter II. The objective is to identify factors which
have a significant impact on attrition ('significant' can mean either a positive or
negative impact) with particular interest in those variables involving gender. The
methodology involves developing a model of attrition, and further analyzing
those variables which contribute significantly to the model.


A. THE MODEL 

The data, prepared for analysis in Chapter II, include: a binary response
variable (graduation/attrition) and a set of 15 predictor variables, which are a
mixture of continuous and categorical variables (Chapter II, Table 2). Among the
most common models considered appropriate for binary response variables are
logit and probit. The logit model is used since the results from the two models
are typically comparable, and the logit model is computationally easier to work
with (Collett, 1991).


Logistic regression fits binary response variables ($Y$) to a function of
predictor variables $X_1, X_2, \ldots, X_p$ in such a way that $E[Y]$, or equivalently,
$\Pr(Y=1)$, is between 0 and 1. Specifically, it fits the logit of $\Pr(Y=1)$ as a linear
function of the predictors $X_1, X_2, \ldots, X_p$ as follows:

$$\log\left(\frac{\Pr(Y=1)}{\Pr(Y=0)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \tag{3.1}$$

or equivalently

$$\Pr(Y=1) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)\}} , \tag{3.2}$$

where $\beta_0, \ldots, \beta_p$ are unknown parameters (Collett, 1991). In equations (3.1)
and (3.2), $Y=1$ represents graduation and $Y=0$ represents attrition. Parameter
estimates are obtained through the method of maximum likelihood, for which the
logistic model has no closed-form solution. Iterative numerical solutions are
required; the most commonly used is Newton's method. Once the model is fitted,
likelihood ratio tests are used to test the significance of the model as a whole
and to eliminate variables which are redundant or do not have predictive ability
(Agresti, 1990).
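
To make the iterative fit concrete, the sketch below applies Newton's method to
a logistic model with an intercept and a single predictor. It is only an
illustration under stated assumptions: the function name and the eight data
points are hypothetical, and the statistical package used for the study is not
reproduced here. For a 0/1 predictor the maximum likelihood fit reproduces the
observed proportion in each group, which gives a simple check on the result.

```python
import math

def fit_logit_newton(x, y, iters=25, tol=1e-12):
    """Maximum likelihood fit of logit Pr(Y=1) = b0 + b1*x by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient) components
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: 1 of 4 graduate when x = 0, and 3 of 4 when x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logit_newton(x, y)
```

Here the fitted intercept equals log(0.25/0.75) and the slope equals
2 log(3), so the fitted probabilities are 0.25 and 0.75 for the two groups.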

B. ANALYSIS 


Generally, there are two types of information which can be derived from 
any regression model. First, there is the ability of the model to predict changes 
in the response with respect to changes in predictor variables. Second, 
important insight into the question of interest may be obtained from the structure 
of the model itself; i.e., which predictors or combination of predictors seem to 
have a significant impact on the response. In the case of logistic regression, 
predictive ability is often limited (Hamilton, 1992). The typically low predictive 
power of logistic regression models is not a concern here, since the purpose of 
this study is not to predict who will attrite, but to compare attrition results 
between males and females. 

1. Model Reduction 

The first goal in arriving at a suitable model is to find a combination of 
predictors which capture the features of interest, yet is parsimonious. To reduce 
the number of predictors, a backwards elimination procedure is used. The first 
model is fit including all 15 main effects, and all of the two-way interaction terms 
between them (120 terms in all). Then, subsets of predictor variables which 
show the least significance are removed and the model is run again. This 
iterative procedure is continued until a satisfactory model is obtained with a 


balance of descriptive (not necessarily predictive) usefulness and simplicity. 
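
The count of 120 terms in the initial model follows from 15 main effects plus
one interaction for every unordered pair of distinct predictors. A short check,
with variable names chosen here for illustration:

```python
import math

n_main_effects = 15
n_two_way = math.comb(n_main_effects, 2)   # unordered pairs of distinct predictors
n_terms = n_main_effects + n_two_way
```

Note that this counts model terms, not parameters; a categorical predictor with
several levels contributes more than one parameter per term.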










Arrival at this satisfactory model is a matter of analyst judgment based on 
hypothesis testing. A comparison is made between the current (reduced) model 
and the one prior to it to test whether there is a significant difference between
them. (Agresti, 1990) 


The hypothesis test is performed as follows: let Model(i) represent the
model under consideration in the i-th iteration of backwards elimination. Test
the null hypothesis (Ho) that Model(i) is true versus the alternative hypothesis
(Ha) that Model(i-1) is true. Note that under backwards elimination Model(i)
contains fewer terms than Model(i-1). Then the likelihood ratio test statistic (T)
is two times the difference between the log likelihood under Model(i-1) and the
log likelihood under Model(i). The null distribution of T is approximately
Chi-Squared with k degrees of freedom; k is the difference between the number
of parameters in Model(i-1) and Model(i). Large values of T indicate that the null
hypothesis (Ho) should be rejected in favor of the alternative hypothesis (Ha);
i.e., the model cannot be reduced by eliminating the variables chosen in the
current iteration. Equivalently, if the p-value (the smallest level of significance for
which the test statistic causes rejection of Ho) is small, then Ho is rejected. If
there is a significant difference between the models, then some or all of the
removed effects should remain in the model. Main effects, regardless of
significance, are left in the model if they are part of a significant interaction term.
When no more effects can be removed from the model without a significant
change, the current model is one which is as small (with respect to the number
of predictor variables) as possible, and inferences can be made about the
significance of the remaining predictors.
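
The test just described can be sketched in a few lines. The log likelihood
values below are hypothetical and the function names are assumptions made for
this example; the chi-squared tail probability is computed with a standard
recurrence for integer degrees of freedom so that the sketch needs only the
standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Base cases: df = 1 (a = 1/2) and df = 2 (a = 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return (T, p-value) for Ho: the reduced model is adequate."""
    t = 2.0 * (loglik_full - loglik_reduced)
    return t, chi2_sf(t, df)

# Hypothetical values: dropping terms with 5 parameters lowers the
# log likelihood from -990 to -1000.
t_stat, p_value = likelihood_ratio_test(-1000.0, -990.0, df=5)
```

In this made-up case T = 20 on 5 degrees of freedom, the p-value is well below
0.10, and the removed terms would be kept in the model.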

2. All Data 

In all, 76 iterations were performed on the full data set. The final model
includes 40 terms, of which 25 are significant (at a 0.10 level of significance).
The uncertainty coefficient (U = 0.2947) indicates limited predictive power, as
expected. The 'uncertainty coefficient' (U) is a statistic analogous to the familiar
R-squared, and its purpose is to describe the level of predictive utility in the
model. It is computed as follows:

$$U = \frac{-\mathrm{LogLikelihood}(\text{const model}) - \{-\mathrm{LogLikelihood}(\text{fit model})\}}{-\mathrm{LogLikelihood}(\text{const model})} \tag{3.3}$$

where the constant model is fit including only the intercept term. Table 3 lists
significant terms in the final model, in order of decreasing significance. The
p-values associated with the likelihood ratio test of the model excluding each
variable, one at a time, are given in Table 3.
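
Equation (3.3) amounts to the proportional reduction in negative log likelihood
achieved by the fitted model. A minimal sketch follows; the function name is an
assumption, and the two input values are hypothetical numbers chosen only so
that the result matches the reported U, since the actual log likelihoods are not
given in the text.

```python
def uncertainty_coefficient(neg_loglik_const, neg_loglik_fit):
    """U from equation (3.3); both arguments are negative log likelihoods."""
    return (neg_loglik_const - neg_loglik_fit) / neg_loglik_const

# Hypothetical inputs (an assumption), chosen to reproduce U = 0.2947.
u = uncertainty_coefficient(1000.0, 705.3)
```

A value of 0 means the predictors add nothing to the intercept-only model, and
a value of 1 would mean the fitted model predicts every response perfectly.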


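TERM                     P-VALUE        TERM                 P-VALUE

SERVICE*ASVABqtiles      [illegible]    LANCAT               0.0080
SERVIC[illegible]        [illegible]    [illegible]          [illegible]
LANCAT*AFQTqtiles        0.0001         MARRY                0.0120
SERVICE*AFQTqtiles       0.0010         AGE*AFQTqtiles       0.0166
PRILANCAT*ASVABqtiles    0.0024         LANCAT*AGE           0.0258
[illegible]              [illegible]    YRSRV*AGE            0.0265
[illegible]              [illegible]    LANCAT*PAYGRADE2     0.0274
[illegible]              [illegible]    [illegible]          0.0342
YRSRV*EDUYRgrp           [illegible]    MOTIV                [illegible]

[remaining rows illegible]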


Table 3. Significant terms in the final model, in order of decreasing significance. 






Once the final model is developed, consisting of first order main effects
and two-way interactions, further analysis is conducted to assure that the
continuous main effects are of the proper form. Specifically, it is important to
verify that the logit of the probability of graduation is linear in each continuous
main effect and that transformations or re-parameterizations of the continuous
main effects do not provide a better fit. Partial residuals are plotted against each
continuous main effect. If the resulting plots are approximately linear, then
higher order terms are not indicated. Partial residuals, $PR_{ik}$, are computed for
each of the $i = 1, \ldots, 1985$ observations and $k = 1, \ldots, 4$ continuous main effects
(PAYGRADE2, DLAB, AGE, YRSERV respectively), as follows:

$$PR_{ik} = \frac{Y_i - P_i}{P_i (1 - P_i)} + \beta_k X_{ik} , \tag{3.4}$$

where:

$Y_i$ = response (graduation/attrition) for the $i$th observation,
$P_i$ = estimated probability of graduation for the $i$th observation,
$\beta_k$ = parameter estimate for the $k$th continuous main effect, and
$X_{ik}$ = value of the $k$th continuous main effect for the $i$th observation

(Collett, 1991).

From Figure 19, the plots of the partial residuals against DLAB, AGE, and
YRSERV are quite linear, confirming that higher order terms are not required.
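
The partial residual of equation (3.4) is simple to compute once fitted
probabilities and parameter estimates are available. The sketch below uses a
hypothetical observation; the function name and all input values are
assumptions for illustration only.

```python
def partial_residual(y, p, beta_k, x_k):
    """Partial residual from equation (3.4) for one observation and one
    continuous main effect."""
    return (y - p) / (p * (1.0 - p)) + beta_k * x_k

# Hypothetical graduate (y = 1) with fitted probability 0.8, a parameter
# estimate of 0.5, and a predictor value of 2.0.
pr = partial_residual(y=1, p=0.8, beta_k=0.5, x_k=2.0)
```

Plotting these values against the predictor, one point per observation, gives
the kind of display summarized in Figure 19.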


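[Figure 19 plots: partial residuals against each continuous main effect; the
only legible panel label is YRSERV.]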





Figure 19. Partial residual plots for continuous main effects. 







In the case of PAYGRADE2, there appears to be a slight non-linearity in the 
region of the lower paygrades. As a check, PAYGRADE2 is transformed into a 
categorical variable with one level for each paygrade. This transformation has 
no appreciable effect on the model, confirming that coding paygrade as the 
continuous variable PAYGRADE2 is adequate. Note that the slopes of the lines 
in Figure 19 are the parameter estimates for the respective variables, giving an 
indication of the relative impact of each of these variables on the model. A 
positive slope indicates a favorable impact on graduation as the values for these 
variables increase. 

Analysis of the model structure will help to determine which variables
have an impact on attrition. Figure 20 graphically depicts the complexity of the
model given in Table 3. Of the main effects, 5 are significant:
LANCAT, PRILANCAT, MARRY, MOTIV, and SERVICE. Effects with a
relatively high occurrence of interaction (4 or more) are: AGE, AFQTqtiles,
ASVABqtiles, PAYGRADE2, and SERVICE. The predictor variable of interest,
SEX, is not significant as a main effect; however, its interaction with SERVICE is
significant. The variable which appears to have the most impact, SERVICE, is
significant as a main effect and is part of six significant interaction terms, the
most notable of which in this context is SEX*SERVICE. From Chapter II, Figure 10,
we see that female Marines have a relatively high rate of attrition (60%). This
suggests a possible explanation for the significance of the SEX*SERVICE
interaction term.

[Figure 20 diagram: network of the variables in the final model; legible node
labels include ETHNIC and AFQTqtiles.]

Figure 20. Graphical representation of variables in final model. Solid ellipses represent
significant main effects, dashed ellipses represent non-significant main effects. Significant
interaction terms are connected with lines.

3. Without USMC Data 

A more detailed breakdown of the data is indicated. Specifically, the
model is run again excluding all USMC observations to see if there is a change
in the significance of the SEX*SERVICE interaction term. The iterative
procedure described earlier in this chapter is used to reduce the model to the
fewest possible number of predictors. Table 4 lists significant terms in the model
run on data excluding USMC observations. Figure 21 graphically depicts the
information contained in Table 4.

TERM             P-VALUE    TERM                  P-VALUE

[illegible]      0.0024     DLAB*AFQTqtiles       [illegible]
[illegible]      0.0026     MARRY*AFQTqtiles      [illegible]
[illegible]      0.0026     MARRY                 [illegible]
[illegible]      0.0063     MOTIV*ASVABqtiles     [illegible]

[remaining rows illegible]

Table 4. Significant terms in the final model excluding USMC data, in decreasing order of
significance.

[Figure 21 diagram: network of the variables in the final model excluding USMC
data; legible node labels include PRILANCAT, ETHNIC, and SERVICE.]

Figure 21. Graphical representation of variables in final model, excluding USMC data. Solid
ellipses represent significant main effects, dashed ellipses represent non-significant main
effects. Significant interaction terms are connected with lines.

Excluding the USMC data reduces the complexity of the model slightly. The
number of significant main effects is reduced from 5 to 4 (MOTIV is no longer
significant), while the number of significant interaction terms remains the same.
Significant main effects include: LANCAT, YRSRV, MARRY, and PRILANCAT.
Effects with a high frequency of interaction terms (4 or more) include: AGE,
AFQTqtiles, ASVABqtiles, and SERVICE. The interaction term SEX*SERVICE is
still significant (at a conservative level of significance of 0.10), although less so,
with an increase in p-value from 0.0368 to 0.0749. The SEX*SERVICE interaction
term was not affected greatly by controlling for USMC students, probably due to
the relatively low weighting of USMC observations, which constitute only 5% (100
observations) of the data. Of all USMC observations, only 5% (5 observations)
are female. In fact, the 60% (3 out of 5 observations) USMC female attrition
rate has a standard error of 21%.
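
The size of this standard error can be checked with the usual binomial formula
sqrt(p(1 - p)/n). That the reported figure was obtained from this formula is an
assumption; the function name is also chosen here for illustration.

```python
import math

def proportion_standard_error(p, n):
    """Binomial standard error of a sample proportion p based on n cases."""
    return math.sqrt(p * (1.0 - p) / n)

# 3 attritions out of 5 female USMC observations, as stated in the text.
se = proportion_standard_error(3 / 5, 5)
```

The result is about 0.22, in line with the roughly 21% quoted above, and it
shows how little can be inferred from a rate based on only five observations.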




To further control for the effects of SERVICE, additional runs are 
performed on individual service groups. Computational problems arise when 
there are too many variables in a model, relative to the number of observations. 
Army and Air Force data are run individually, with 62% and 21% of the students, 
respectively. Navy and USMC data are not run individually, because they do not 
constitute a large enough proportion of the data to provide useful results (12% = 
250 observations and 5% = 100 observations, respectively). 

4. Army Data Only 

Table 5 lists significant terms in the final model run for Army data, and
Figure 22 provides a graphical representation of the information contained in
Table 5. When compared to results from the model run including all data, there
is a reduction in the total number of significant terms from 25 to 18, with a
reduction in the number of significant main effects from 5 to 4. Significant main
effects include: LANCAT, YRSRV, MARRY, and PRILANCAT. SEX is not a
significant predictor variable. Terms with a high frequency of interaction (4 or
more) include: LANCAT, AGE, and AFQTqtiles. From Figure 22 we see a
visible reduction in the overall complexity of the model for Army data only, as
compared to the model run on all data.

TERM             P-VALUE    TERM                  P-VALUE

[all rows illegible]

Table 5. Significant terms in the final model including only Army data, in decreasing order of
significance.











[Figure 22 diagram: network of the variables in the final model for Army data
only; legible node labels include ETHNIC, PAYGRADE2, and SERVICE.]





Figure 22. Graphical representation of variables in final model, including only Army data. Solid 
ellipses represent significant main effects, dashed ellipses represent non-significant main effects. 
Significant interaction terms are connected with lines. 


5. Air Force Data Only 
The next run was done on data including only Air Force students. Table 6
lists significant terms in this model. The smaller, less variable data set including
only Air Force students yields a much simpler model, resulting in only 9
significant terms, of which 5 are main effects: LANCAT, DLAB, MARRY,
PAYGRADE2, and SEX. Of the 4 interaction terms, MARRY and AFQTqtiles are
each involved in two. Most important is the fact that this group of observations
results in the only occurrence of SEX as a significant main effect. The presence
of SEX as a significant main effect for Air Force data probably explains the
significance of the SEX*SERVICE interaction term for all data including Air
Force observations. Figure 23 graphically depicts the relative simplicity of this
model.

[Table 6: all rows illegible]

Table 6. Significant terms in the final model including only Air Force data, in decreasing order of
significance.

To confirm the suspicion that Air Force data causes the SEX*SERVICE 
interaction term to be significant, the model is run on the complete data set 
excluding Air Force observations. The SEX*SERVICE interaction term becomes 


highly insignificant, with a p-value of 0.3992. 


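[Figure 23 diagram: network of the variables in the final model for Air Force
data only; node labels illegible.]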





Figure 23. Graphical representation of variables in final model, including only Air Force data. 
Solid ellipses represent significant main effects, dashed ellipses represent non-significant main 
effects. Significant interaction terms are connected with lines. 


To explore possible reasons why Air Force data might have this effect, a
comparison is made between Air Force females and all other females in the data
set. Key variables from each predictor block (demographic, language specific,
and test scores) were chosen for comparison: SEX, AGE, YRSERV,
PAYGRADE, LANCAT and DLAB. Females account for 35% of Air Force
observations, compared to 24% for all other observations. There is no
appreciable difference between groups for AGE, YRSERV, and DLAB. Air
Force females, however, are heavily weighted toward the more junior, 'high risk'
paygrades, with 95% of Air Force females in paygrades E-3 and below,
compared to 73% for all other females. All Air Force females who attrited
are from paygrades E-3 and below, compared to 84% for all other females.
Language category distributions also differ, with 56% of Air Force females in the
more difficult Category IV languages, compared to 45% for all other females.
Of the Air Force females who attrited, 60% are from Category IV languages,
compared to 47% for all other females. These facts do not suggest that Air
Force females are attriting more than their male counterparts due to their
gender. In fact, for this model, equation (3.2) yields an estimated parameter
value for the predictor variable SEX (with SEX coded as 0,1 for males and
females, respectively) of approximately 0.50, with a standard error of 0.21. The
positive value for this estimated parameter suggests the following: given exactly
the same attributes (e.g., paygrade E-3 and below), a given female is likely to
perform no worse than a male. For example, the probability of attrition for Air
Force males in paygrade E-3 and below is 37%, compared to 36% for Air Force
females. For paygrades E-4 and above, the probabilities are 16% and 0%,
respectively. This is not inconsistent with the attrition statistics depicted in
Figure 10; it merely underscores the fact that Air Force females tend to be in
'higher risk' paygrades.
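
One way to read the SEX estimate is on the odds scale. The sketch below takes
the two values quoted above (0.50 and 0.21) and computes an odds ratio, a Wald
ratio, and an approximate 95% interval. The Wald quantities are an addition for
illustration and an assumption on our part; the analysis in this chapter relies
on likelihood ratio tests. The reading also assumes that a positive coefficient
raises the odds of graduation, as stated for the slopes in Figure 19.

```python
import math

beta_sex, se_sex = 0.50, 0.21          # estimates quoted in the text
odds_ratio = math.exp(beta_sex)        # female vs. male odds of graduation
wald_z = beta_sex / se_sex             # estimate divided by its standard error
ci_low = beta_sex - 1.96 * se_sex      # approximate 95% interval for the parameter
ci_high = beta_sex + 1.96 * se_sex
```

Under these assumptions the odds of graduation for a female are about 1.65
times those of an otherwise identical male, and the approximate interval for
the parameter lies above zero.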

6. Gender as Response Variable 

An additional, less complex model is constructed to provide a different
perspective on the problem. For this model, including only main effects, the
roles of SEX and GRAD/ATTR are reversed, i.e., SEX is the response variable
and GRAD/ATTR is used as a predictor. This is done to see if there is any
change in the relationship between gender and attrition when viewed from this
reverse 'angle'. If GRAD/ATTR is a significant predictor of SEX, then inferences
can be made about the nature of the relationship between the variables. Table 7
lists all predictor variables in the model, with their associated p-values. There is
a high degree of significance for variables related to test scores, and for branch
of service. The p-value for GRAD/ATTR is 0.4496, indicating that this variable is
a highly insignificant predictor of gender.

TERM             P-VALUE    TERM                  P-VALUE

PAYGRD           0.0248     PRILANCAT             0.5155

[remaining rows illegible]

Table 7. Terms in the final model with SEX as response, in decreasing order of significance.

Chapter IV contains conclusions based on the results of the analysis
conducted in this chapter, along with recommendations for further study.













IV. RESULTS/CONCLUSIONS 


This chapter summarizes results from Chapter III, and makes inferences
about significant variables in the various models. Recommendations are made
about areas which lend themselves to further study.

A. RESULTS 


A final model is constructed for each of the five categories below. The 
models grow progressively simpler as the data groups become smaller and more 
homogeneous. Table 8 summarizes the results of the final model for each of the 
data groups included. Listed is whether the variable is significant as a main 
effect, and how many interaction terms it is involved in. 


[Table 8 body is not legible in this copy. Columns: VARIABLE, ALL DATA, NO USMC, ARMY ONLY, AIR FORCE ONLY, SEX AS RESPONSE. Legible row labels include MARRY, PRILANCAT, AGE, ETHNIC, AFQT and ASVAB percentile scores, EDUYRGRP, SERVICE, PAYGRADE2, and SEX (GRAD/ATTR in the SEX AS RESPONSE column).]


Table 8. Variables in final model for each data group. Listed is significance as main 
effect/number of interaction terms the variable is involved in. 


For the model including all of the data, there are 5 significant main 
effects. Variable blocks with the highest frequency of significant variables, either 
as main effects or in interactions, are: demographics (5), language specific 
variables (2), and test scores (2). Service branch is the single most involved 
effect: it is significant as a main effect and is involved in 6 interaction terms. The 
predictor of interest, SEX, is not significant as a main effect, but its interaction 
with SERVICE is a significant effect (p-value = 0.0368). 
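
A common way to judge whether a term such as SEX*SERVICE adds to a logistic regression model is to compare the residual deviance of the model fitted with and without the term, and to refer the drop in deviance to a chi-square distribution. The sketch below shows that calculation. It is an assumption that a test of this kind underlies the p-values reported here, the deviance values are hypothetical, and the degrees of freedom (3, for two sexes and four services) are likewise an assumption about how the term is coded.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with a
    positive integer number of degrees of freedom."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed forms exist for df = 1 and df = 2. Larger df are reached with
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def deviance_test(dev_without, dev_with, df):
    """Drop in deviance from adding a term, and its chi-square p-value."""
    drop = dev_without - dev_with
    return drop, chi2_sf(drop, df)

# HYPOTHETICAL deviances, for illustration only (not from the thesis).
drop, p_value = deviance_test(dev_without=1507.0, dev_with=1500.0, df=3)
print(f"deviance drop = {drop:.1f}, p-value = {p_value:.4f}")
```

A small p-value indicates that the model fits noticeably better with the term than without it.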

To control for apparent anomalies in the attrition statistics for female 
Marines, the data are broken into smaller groups. The model is run on all data, 
excluding USMC observations, to see if the interaction term SEX*SERVICE 
remains significant. Controlling for the USMC data does not eliminate the 
interaction of SEX*SERVICE as a significant effect, although its p-value is 
increased from 0.0368 to 0.0749. Although removing USMC observations 
removes SERVICE as a significant main effect, it is still significant in several 
interactions. The model is not very sensitive to the exclusion of USMC data due 
to the small number (5) of female Marines in the data set. 

To further investigate the effects of branch of service on attrition, 
additional runs are made on the Army and Air Force data separately. There are 
too few observations for the other services (Navy and USMC) to allow fitting the 
model with all of the predictor variables. 

For the model run on Army data only, there are 4 significant main effects. 
Variable blocks with the highest degree of involvement in significant effects 
include: demographics (3), language specific variables (2), and test scores (1). 
The predictor of interest, SEX, is not significant as a main effect or interaction 
term. 

For the model run on USAF data only, there are 5 significant main effects. 
This is the only data group in which SEX is a significant main effect. The 
presence of SEX as a significant main effect for the Air Force data leads to the 
conclusion that the Air Force observations cause the significance of the 
SEX*SERVICE interaction term in models including Air Force data. A model run 
on all data, excluding Air Force observations, supports this conclusion, since the 
SEX*SERVICE interaction term is then far from significant. Further analysis 
reveals attributes in which Air Force females differ from other females among the 
entire data set. Females account for 35% of all Air Force data, compared to 
24% for the other services as a whole. Other areas in which Air Force females 
differ are language category and paygrade: 56% of Air Force females are in the 
most difficult language category (IV) compared to 45% for all other females. 
Also, 95% of Air Force females are in the ‘higher risk’ paygrades of E-3 and 
below, compared to 73% for all other females. 

B. CONCLUSIONS 


In summary, gender is a significant main effect for the model run on Air 
Force subjects only, and it is a significant interaction term for the full data set 
and the data excluding USMC observations. A model run on all data, excluding 
Air Force observations, supports the conclusion that the Air Force subjects 
cause the significance of the SEX*SERVICE interaction term in the other 
models. 

This study indicates that Air Force females do not attrite more frequently 
than their male counterparts due to their gender: in fact, compared to Air Force 
males with identical attributes (e.g., the same paygrade group) Air Force females 
have similar (or smaller) attrition rates. The higher overall attrition rate for Air 
Force females is mostly due to their relatively high proportions in lower 
paygrades and more difficult language categories. 
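
The arithmetic behind this composition effect can be sketched as a weighted average of within-paygrade rates. The example below treats the probabilities quoted in Chapter III (37% and 16% for Air Force males, 36% and 0% for Air Force females) as if they were within-paygrade attrition rates, which is a simplification, and uses the 95% share of Air Force females in E-3 and below. The paygrade mix for Air Force males is not given in this section, so the 60/40 split is purely hypothetical.

```python
def overall_rate(shares, rates):
    """Overall rate as the average of within-group rates, weighted by the
    share of the population in each group."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[group] * rates[group] for group in shares)

# Probabilities of attrition by paygrade group, as quoted in Chapter III.
rates_female = {"E-3 and below": 0.36, "E-4 and above": 0.00}
rates_male = {"E-3 and below": 0.37, "E-4 and above": 0.16}

# 95% of Air Force females are in E-3 and below (from the text).
shares_female = {"E-3 and below": 0.95, "E-4 and above": 0.05}
# ASSUMPTION: hypothetical paygrade mix for Air Force males.
shares_male = {"E-3 and below": 0.60, "E-4 and above": 0.40}

female_overall = overall_rate(shares_female, rates_female)
male_overall = overall_rate(shares_male, rates_male)
print(f"female overall: {female_overall:.3f}, male overall: {male_overall:.3f}")
```

Under these assumptions the overall female rate exceeds the overall male rate even though the female rate is lower within each paygrade group, which is the pattern described above.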

With the exception of the model in which SEX is the response, the 
language specific variables, LANCAT and PRILANCAT, consistently outperform 
other variable blocks, followed closely by demographic variables and assorted 
test scores. The significance of the block of demographic variables is consistent 
with the findings of the Language Skill Change Project referenced in Chapter I. 

A final conclusion is that higher attrition rates for females do not appear to 
be attributable to their gender. Instead, particularly in the case of Air Force 
females (the group having the largest gender impact on the attrition model), the 
comparatively higher attrition rates are considered to be a function of relatively 
high proportions of females in ‘higher risk’ groups such as junior paygrades and 
more difficult language categories. 

C. RECOMMENDATIONS FOR FURTHER STUDY 

There are two areas which lend themselves to further study. First, the 
apparent impact of gender on attrition for Air Force students suggests that a 
more in-depth analysis of Air Force students be conducted to further explore the 
causes for the significant relationship between gender and attrition for these 
students. 

Second, a more detailed exploration of why students fail to graduate is 
indicated. Specifically, there appears to be an imbalance in these reasons for 
males and females. From Chapter I, recall that females attrite overall at a higher 
rate than males. However, attrition for academic reasons is much higher for 
males. 'Reason Out' data, as it is currently collected at DLIFLC, is broken into 
the following categories: Currently Enrolled, Academic, Physical Fitness, Lack 
of Effort, Over Weight, Medical, Discipline, Unit Recall, Security Clearance, and 
Other. 

Excluding the Currently Enrolled and Academic categories, there is a 
relatively high use of the 'Other' category (approximately 15% overall). This 
appears to be at the expense of the remaining categories, suggesting a possible 
overuse of the 'Other' category. Overuse of the 'Other' category may result in 
the loss of information as to the true reason for some student losses. It would be 
useful to determine if this is in fact the case, and to correct the category 
assignment procedures, if necessary. This measure would facilitate a further 
analysis of the various reasons behind student attrition. 







LIST OF REFERENCES 


Agresti, A., Categorical Data Analysis, John Wiley & Sons, 1990. 

Begley, S., "Grey Matters", Newsweek, 27 March 1995. 

Collett, D., Modelling Binary Data, Chapman & Hall, 1991. 

Directorate for Academic Administration, DLIFLC Program Summary, 1995. 

Dove, M., Defense Manpower Data Center, telephone interview, July, 1996. 

Hamilton, L., Regression With Graphics, Duxbury Press, 1991. 

O'Mara, F., Learning Skills Change Project Executive Summary, DLIFLC, 1994. 

O'Mara, F., Learning Skills Change Project Report II, DLIFLC, 1994. 

Rice, J., Army Linguist Personnel Study, DLIFLC, 1975. 

Shaw, V., Christie, E., DLIFLC-DMDC Student Data Base Documentation, 
DLIFLC, 1994. 







INITIAL DISTRIBUTION LIST 


1. Defense Technical Information Center 
   8725 John J. Kingman Road, Ste 0944 
   Ft. Belvoir, VA 22060-6218 

2. Dudley Knox Library 
   Naval Postgraduate School 
   411 Dyer Rd. 
   Monterey, CA 93943-5101 

3. Lyn Whitaker 
   Department of Operations Research 
   Naval Postgraduate School 
   Monterey, CA 93943-5101 

4. Sea Control Squadron Forty One 
   PO Box 357098, N.A.S. North Island 
   San Diego, CA 92135-7098 
   ATTN: LT George Arthur, USN 

5. Stephen Payne 
   Command Historian 
   Defense Language Institute Foreign Language Center 
   Presidio of Monterey, CA 93944-5006 

6. John Lett 
   Director, Research and Analysis Division 
   Defense Language Institute Foreign Language Center 
   Presidio of Monterey, CA 93944-5006 

7. ASO Library 
   Defense Language Institute Foreign Language Center 
   Presidio of Monterey, CA 93944-5006 




Number of Copies 


2 


